Abstract

We consider offline scheduling algorithms that incorporate speed scaling to address the bi- 
criteria problem of minimizing energy consumption and a scheduling metric. For makespan, we 
give linear-time algorithms to compute all non-dominated solutions for the general uniprocessor 
problem and for the multiprocessor problem when every job requires the same amount of work. 
We also show that the multiprocessor problem becomes NP-hard when jobs can require different 
amounts of work. 

For total flow, we show that the optimal flow corresponding to a particular energy budget 
cannot be exactly computed on a machine supporting arithmetic and the extraction of roots. 
This hardness result holds even when scheduling equal-work jobs on a uniprocessor. We do, 
however, extend previous work by Pruhs et al. to give an arbitrarily-good approximation for 
scheduling equal-work jobs on a multiprocessor. 

1 Introduction 

Power consumption is becoming a major issue in computer systems. This is most obvious for 
battery-powered systems such as laptops because processor power consumption has been growing 
much more quickly than battery capacity. Even systems that do not rely on batteries have to 
deal with power consumption since nearly all the energy consumed by a processor is released as 
heat. The heat generated by modern processors is becoming harder to dissipate and is particularly 
problematic when large numbers of them are in close proximity, such as in a supercomputer or a 
server farm. The importance of the power problem has led to a great deal of research on reducing 
processor power consumption; see overviews by Mudge, Brooks et al., and Tiwari et al. [19].
We focus on the technique of dynamic voltage scaling, which allows the processor to enter low-voltage
states. Reducing the voltage reduces power consumption, but also forces a reduction in clock 
frequency so the processor runs more slowly. For this reason, dynamic voltage scaling is also called 
frequency scaling and speed scaling. 

This paper considers how to schedule processors with dynamic voltage scaling so that the 
scheduling algorithm determines how fast to run the processor in addition to choosing a job to 
run. In classical scheduling problems, the input is a series of $n$ jobs $J_1, J_2, \ldots, J_n$. Each job $J_i$
has a release time $r_i$, the earliest time it can run, and a processing time $p_i$, the amount of time
it takes to complete. With dynamic voltage scaling, the processing time is not known until the
schedule is constructed, so instead each job $J_i$ comes with a work requirement $w_i$. A processor
running continuously at speed $\sigma$ completes $\sigma$ units of work per unit of time, so job $J_i$ would have
processing time $w_i/\sigma$. In general, a processor's speed is a function of time and the amount of work
it completes is the integral of this function over time. This paper considers offline scheduling, 
meaning the algorithm receives all the input together. This is in contrast to online scheduling,
where the algorithm learns about each job at its release time. 

To calculate the energy consumed by a schedule, we need a function relating speed to power; the 
energy consumption is then the integral of power over time. Actual implementations of dynamic 
voltage scaling give a list of speeds at which the processor can run. For example, the AMD Athlon 
64 can run at 2000MHz, 1800MHz, or 800MHz. Since the first work on power-aware scheduling
algorithms [23], however, researchers have assumed that the processor can run at an arbitrary speed
within some range. The justification for allowing a continuous range of speeds is twofold. First, 
choosing the speed from a continuous range is an approximation for a processor with a large number 
of possible speeds. Second, a continuous range of possible clock speeds is observed by individuals 
who use special motherboards to overclock their computers. 

Most power-aware scheduling algorithms use the model proposed by Yao et al. [23], in which
the processor can run at any non-negative speed and power = speed$^\alpha$ for some constant $\alpha > 1$. In
this model, the energy required to run job $J_i$ at speed $\sigma$ is $w_i \sigma^{\alpha-1}$ since the running time is $w_i/\sigma$.
This relationship between power and speed comes from an approximation of a system's switching 
loss, the energy consumed by logic gates switching values. 
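
As a small illustration of this model, the following Python sketch computes the running time and energy of a single job run at one constant speed. The function name `run_job` and the default exponent of 3 are assumptions made for the example; the model itself only requires an exponent greater than 1.

```python
def run_job(work, speed, alpha=3.0):
    """Return (running time, energy) for `work` units run at constant `speed`.

    Assumes power = speed**alpha, so energy = power * time
    = work * speed**(alpha - 1).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    time = work / speed
    energy = speed ** alpha * time
    return time, energy
```

For instance, `run_job(2, 2)` takes 1 time unit and 8 units of energy, while `run_job(2, 1)` takes 2 time units but only 2 units of energy, which is the tradeoff the scheduling problems below exploit.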

Most of our results do not assume a specific relationship between power and speed. Except 
where otherwise stated, we just assume that power is a continuous, strictly-convex function of 
processor speed. Formally, strict convexity means that the line segment between any two points 
on the power/speed curve lies above the curve except at its endpoints. More intuitively, strict 
convexity means that power increases super-linearly with speed. The power function is strictly 
convex when $\alpha > 1$ if power = speed$^\alpha$.

To measure schedule quality, we use two classic metrics. Let $S_i^A$ and $C_i^A$ denote the start and
completion times of job $J_i$ in schedule $A$. Most of the paper focuses on minimizing the schedule's
makespan, $\max_i C_i^A$, the completion time of the last job. We also consider total flow, the sum over
all jobs of $C_i^A - r_i$, the time between the release and completion times of job $J_i$.

Either of these metrics can be improved by using more energy to speed up the last job, so
the goals of low energy consumption and high schedule quality are in opposition. Thus, power-aware
scheduling is a bicriteria optimization problem and our goal becomes finding non-dominated
schedules, those such that no schedule can be both better and use less energy. A common approach
to bicriteria problems is to fix one of the parameters. In power-aware scheduling, this gives two 
interesting special cases. If we fix energy, we get the laptop problem, which asks "What is the best 
schedule achievable using a particular energy budget?". Fixing schedule quality gives the server 
problem, which asks "What is the least energy required to achieve a desired level of performance?"

This paper considers both uniprocessor and multiprocessor scheduling. In the multiprocessor 
setting, we assume that the processors have a shared energy supply. This corresponds to scheduling 
a laptop with a multi-core processor or a server farm concerned only about total energy consumption 
and not the consumption of each machine separately. 

Results Our results in power-aware scheduling are the following: 

• For uniprocessor makespan, we give an algorithm to find all non-dominated schedules. Its
running time is linear once the jobs are sorted by arrival time.

• We show that there is no exact algorithm for uniprocessor total flow using arithmetic operations
and the extraction of $k$th roots. This holds even with equal-work jobs.

• For a large class of "reasonable" scheduling metrics, we show how to extend uniprocessor 
algorithms to the multiprocessor setting with equal-work jobs. Using this technique, we give 
an exact algorithm for multiprocessor makespan of equal-work jobs and an arbitrarily-good 
approximation for multiprocessor total flow of equal-work jobs. 

• We prove that multiprocessor makespan is NP-hard if jobs require different amounts of work, 
even when all jobs arrive immediately. 

The rest of the paper is organized as follows. Section 2 describes related work. Section 3
gives the uniprocessor algorithm for makespan. Section 4 shows that total flow cannot be exactly
minimized. Section 5 extends the uniprocessor results to give multiprocessor algorithms for equal-work
jobs and shows that general multiprocessor makespan is NP-hard. Finally, Section 6 discusses
future work.

2 Related work 

The work most closely related to ours is due to Uysal-Biyikoglu, Prabhakar, and El Gamal [20],
who consider the problem of minimizing the energy of wireless transmissions. This application 
has a totally different power function from those occurring in dynamic voltage scaling, but their 
algorithms only rely on the power function being continuous and strictly convex. They give a 
quadratic-time algorithm for the server version of makespan. Thus, our algorithm runs faster and 
also finds all non-dominated schedules rather than just solving the server problem. 

El Gamal et al. consider the wireless transmission problem when the packets have different 
power functions, giving an iterative algorithm that converges to an optimal solution. They also 
show how to extend their algorithm to handle the case when the buffer used to store active packets 
has bounded size and the case when packets have individual deadlines. Their algorithm can also be 
extended to schedule multiple transmitters, but this does not correspond to a processor scheduling 
problem. 

Pruhs, van Stee, and Uthaisombut consider the laptop problem version of minimizing
makespan for jobs having precedence constraints where all jobs are released immediately and
power = speed$^\alpha$. Their main observation, which they call the power equality, is that the sum
of the powers of the machines is constant over time in the optimal schedule. They use binary
search to determine this value and then reduce the problem to scheduling on related fixed-speed
machines. Previously known approximations [7] for the related fixed-speed machine problem
then give an $O(\log^{1+2/\alpha} m)$-approximation for power-aware makespan. This technique cannot be
applied in our setting because the power equality does not hold for jobs with release dates. 

Minimizing the makespan of tasks with precedence constraints has also been studied in the 
context of project management. Speed scaling is possible when additional resources can be used to 
shorten some of the tasks. Pinedo [15] gives heuristics for some variations of this problem.

The only previous power-aware algorithm to minimize total flow is by Pruhs, Uthaisombut, and
Woeginger, who consider scheduling equal-work jobs on a uniprocessor. In this setting, they
observe that jobs can be run in order of release time and then prove the following relationships
between the speeds of the jobs in the optimal solution:

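Theorem 1 Let $J_1, J_2, \ldots, J_n$ be equal-work jobs ordered by release time. In the schedule
OPT minimizing total flow for a given energy budget where power = speed$^\alpha$, the speed $\sigma_i$ of job $J_i$
(for $i \neq n$) obeys the following:

• If $C_i^{\mathrm{OPT}} < r_{i+1}$, then $\sigma_i = \sigma_n$.

• If $C_i^{\mathrm{OPT}} > r_{i+1}$, then $\sigma_i^\alpha = \sigma_{i+1}^\alpha + \sigma_n^\alpha$.

• If $C_i^{\mathrm{OPT}} = r_{i+1}$, then $\sigma_n^\alpha \le \sigma_i^\alpha \le \sigma_{i+1}^\alpha + \sigma_n^\alpha$.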

These relationships, together with observations about when the optimal schedule changes con- 
figuration, give an algorithm based on binary search that finds an arbitrarily-good approximation 
for either the laptop or the server problem. In fact, they can plot the exact tradeoff between total 
flow and energy consumption for optimal schedules in which the third relationship of Theorem 1
does not occur. Our impossibility result in Section 4 shows that the difficulty caused by the third
relationship cannot be avoided. 

The idea of power-aware scheduling was proposed by Weiser et al., who use trace-based
simulations to estimate how much energy could be saved by slowing the processor to remove idle
time. Yao et al. [23] formalize this problem by assuming each job has a deadline and seeking the
minimum-energy schedule that satisfies all deadlines. They give an optimal offline algorithm and
propose two online algorithms. They show one is $(2^{\alpha-1}\alpha^\alpha)$-competitive, i.e. it uses at most $2^{\alpha-1}\alpha^\alpha$
times the optimal energy. Bansal et al. [3] analyze the other, showing it is $\alpha^\alpha$-competitive. Bansal
et al. [4] also give another algorithm that is $(2(\alpha/(\alpha-1))^\alpha e^\alpha)$-competitive.

Power-aware scheduling of jobs with deadlines has also been considered with the goal of min- 
imizing the CPU's maximum temperature. Bansal et al. [3] propose this problem and give an 
offline solution based on convex programming. Bansal and Pruhs [5] analyze the online algorithms
discussed above in the context of minimizing maximum temperature. 

A different variation is to assume that the processor can only choose between discrete speeds. 
Chen et al. show that minimizing energy consumption in this setting while meeting all deadlines
is NP-hard, but give approximations for some special cases. 

Another algorithmic approach to power management is to identify times when the processor or 
parts of it can be partially or completely powered down. Irani and Pruhs survey work along
these lines as well as approaches based on speed scaling. 

3 Makespan scheduling for a single processor 

Our first result is an algorithm to find all non-dominated schedules for uniprocessor power- 
aware makespan. We begin by solving the laptop problem for an energy budget E. Let OPT be 
an optimal schedule for this problem, i.e. OPT has minimum makespan among schedules using 
energy E.

3.1 Algorithm for laptop problem

To find OPT, we establish properties it must satisfy. (We omit formal proofs for most of the
properties and merely describe the relevant ideas.) Our first property is due to Yao, Demers, and
Shenker [23], who observed that the speed does not change during a job, since otherwise energy could
be saved by running that job at its average speed.

Lemma 2 ([23]) Each job runs at a single speed in OPT.

We use $\sigma_i^A$ to denote the speed of job $J_i$ in schedule $A$, omitting the schedule when it is clear
from context.
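
To see the idea behind Lemma 2 numerically, the sketch below compares a job run at two different speeds with the same job run at its average speed over the same time span. It assumes power = speed$^\alpha$ with $\alpha = 3$; the helper name `energy` and the specific numbers are illustrative only.

```python
def energy(pieces, alpha=3.0):
    """Energy of a job run as (work, speed) pieces, assuming power = speed**alpha."""
    return sum(work * speed ** (alpha - 1) for work, speed in pieces)

# One unit of work at speed 1, then one unit of work at speed 3.
pieces = [(1.0, 1.0), (1.0, 3.0)]
total_work = sum(work for work, _ in pieces)
total_time = sum(work / speed for work, speed in pieces)

# Running at the average speed finishes the same work in the same time.
average_speed = total_work / total_time

two_speed_energy = energy(pieces)
constant_energy = energy([(total_work, average_speed)])
```

In this example the two-speed run uses 10 units of energy while the constant-speed run uses about 4.5, with identical start and completion times.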

The second property allows us to fix the order in which jobs are run. 
Lemma 3 Without loss of generality, OPT runs jobs in order of their release times. 

Lemma 3 holds because reordering jobs (without changing their speeds) so that a job runs
before jobs released after it produces a legal schedule. To simplify notation, we assume the jobs
are indexed so $r_1 \le r_2 \le r_3 \le \cdots \le r_n$.

The third property is that OPT is not idle between the release of the first job and the completion 
of the last job. 

Lemma 4 OPT is not idle between the release of job $J_1$ and the completion of job $J_n$.

Lemma 4 holds because slowing down the job running before a period of idle time saves energy,
which can then be used to speed up the last job and reduce the makespan.

Stating the next property requires a definition. A block is a maximal substring of jobs such that 
each job except the last finishes after the arrival of its successor. For brevity, we denote a block 
with the indices of its first and last jobs. Thus, the block with jobs $J_i, J_{i+1}, \ldots, J_{j-1}, J_j$ is block
$(i,j)$. The fourth property is the analog of Lemma 2 for blocks.

Lemma 5 In OPT, jobs in the same block run at the same speed. 

Proof: If the lemma does not hold, we can find two adjacent jobs $J_i$ and $J_{i+1}$ in the same block
of OPT with $\sigma_i \neq \sigma_{i+1}$. Let $\epsilon$ be a positive number less than the amount of work remaining in
job $J_i$ at time $r_{i+1}$. Consider changing the schedule by running $\epsilon$ work of $J_i$ at speed $\sigma_{i+1}$ and
$\epsilon$ work of $J_{i+1}$ at speed $\sigma_i$. Since the block contains the same amount of work at each speed,
the makespan is unchanged and the same amount of energy is used. By construction, this change
does not cause the schedule to violate any release times. Job $J_i$ does not run at a constant speed,
however, contradicting Lemma 2. □

Lemma 5 shows that speed is a property of blocks. In fact, if we know how OPT is broken
into blocks, we can compute the speed of each block. The definition of a block and Lemma 4 mean
that block $(i,j)$ starts at time $r_i$. Similarly, block $(i,j)$ completes at time $r_{j+1}$ unless it is the last
block. Thus, any block $(i,j)$ other than the last runs at speed $(\sum_{k=i}^{j} w_k)/(r_{j+1} - r_i)$. To compute
the speed of the last block, we subtract the energy used by all the other blocks from the energy
budget $E$. We choose the speed of the last block to exactly use the remaining energy.
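
As a worked instance of these formulas, assume power = speed$^\alpha$ (the general results only need strict convexity) and write $W$ for the total work in the last block and $E'$ for the energy left after the other blocks; both symbols are introduced here only for illustration. The last block then uses energy $W\sigma^{\alpha-1}$, so

$$\sigma_{\mathrm{last}} = \left(\frac{E'}{W}\right)^{1/(\alpha-1)}.$$

For example, take $\alpha = 3$, jobs with $(r_1, w_1) = (0, 5)$, $(r_2, w_2) = (5, 2)$, $(r_3, w_3) = (6, 1)$, each in its own block, and a hypothetical budget $E = 21$. Block $(1,1)$ runs at speed $5/(5-0) = 1$ and uses energy $5 \cdot 1^2 = 5$; block $(2,2)$ runs at speed $2/(6-5) = 2$ and uses energy $2 \cdot 2^2 = 8$. That leaves $E' = 21 - 13 = 8$ for the last block, which runs at speed $\sqrt{8/1} \approx 2.83$.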

Using the first four properties, an $O(n^2)$-time dynamic programming algorithm can find the
best way to divide the jobs into blocks. To improve on this, we establish the following restriction
on allowable block speeds:

Lemma 6 The block speeds in OPT are non-decreasing.

Proof: Suppose to the contrary that OPT runs a block $(i,j)$ faster than block $(j+1,k)$. Let $\epsilon > 0$
be less than the amount of work in either block. We modify the schedule by running $\epsilon$ of the work
in each block at the other block's speed. This does not change when the pair of blocks complete or
how much energy they consume since the same amount of work is run at each speed. The modified
schedule is valid since no job starts earlier than in OPT. Thus, we have created another optimal
schedule, but it runs block $(i,j)$ at two speeds, contradicting either Lemma 2 or Lemma 5. □

It turns out that OPT is the only schedule having all the properties given by Lemmas 2-6.

Lemma 7 For any energy budget, there is a unique schedule having the following properties:

1. Each job runs at a single speed

2. Jobs are run in order of release time

3. It is not idle between the release of job $J_1$ and the completion of job $J_n$

4. Jobs in the same block run at the same speed

5. The block speeds are non-decreasing

Proof: Suppose to the contrary that $A$ and $B$ are different schedules obeying all five properties
and consuming the same amount of energy. Since each schedule is determined by its blocks, $A$ and
$B$ must have different blocks. Without loss of generality, suppose the first difference occurs when
job $J_i$ is the last job in its block for schedule $A$ but not for schedule $B$. We claim that every job
indexed at least $i$ runs slower in schedule $B$ than in schedule $A$. Since energy consumption increases
with speed, this implies that schedule $B$ uses less energy than schedule $A$, a contradiction.

In fact, we prove the strengthened claim that every job indexed at least $i$ runs slower and
finishes later in schedule $B$ than in schedule $A$. First, we show this holds for job $J_i$. Job $J_i$ ends
its block in schedule $A$ but not in schedule $B$ so $C_i^B > r_{i+1} = C_i^A$. Since each schedule begins the
block containing job $J_i$ at the same time and runs the same jobs before job $J_i$, job $J_i$ runs slower
in schedule $B$ than in schedule $A$.

Now we assume that the strengthened claim holds for jobs indexed below $j$ and consider job $J_j$.
Since each job $J_i, \ldots, J_{j-1}$ finishes no earlier than its successor's release time in schedule $A$, each
finishes after its successor's release time in schedule $B$. Thus, none of these jobs ends a block in
schedule $B$ and schedule $B$ places jobs $J_i$ and $J_j$ in the same block, which implies $\sigma_i^B = \sigma_j^B$. Speed
is non-decreasing in schedule $A$ so $\sigma_i^A \le \sigma_j^A$. Therefore, $\sigma_j^B = \sigma_i^B < \sigma_i^A \le \sigma_j^A$ so job $J_j$ runs slower
in schedule $B$ than in schedule $A$. Job $J_j$ also finishes later because job $J_{j-1}$ finishing later implies
that job $J_j$ starts later. □

Because only OPT has all five properties, we can solve the laptop problem by finding a schedule 
with the properties. For this task, we propose an algorithm IncMerge. This algorithm maintains 
a tentative list of blocks, initially empty. Each block knows its speed, calculated as described above 
from the release time of the next job (including jobs not yet added to the schedule) or the energy 
budget. Jobs are added to the schedule one at a time in order of their release times. When a new 
job is added, it starts in its own block. Then, while the last block runs slower than its predecessor, 
the last two blocks are merged. Assuming the input is already sorted by release time, IncMerge
runs in $O(n)$ time since each job ceases to be the first job of a block at most once.
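
The description above can be sketched in Python as follows. This is one possible reading of IncMerge, not the authors' code: it assumes power = speed$^\alpha$ so that the last block's speed has a closed form, and it assumes strictly increasing release times to avoid zero-length blocks. The names `inc_merge`, `used`, and `block_energy` are introduced for the example.

```python
def inc_merge(jobs, energy, alpha=3.0):
    """Sketch of IncMerge for the laptop problem on one processor.

    jobs   -- non-empty list of (release_time, work), sorted by strictly
              increasing release time (an assumption of this sketch)
    energy -- energy budget, must be positive
    alpha  -- assumes power = speed**alpha with alpha > 1

    Returns (blocks, makespan); each block is (start_time, work, speed).
    """
    if not jobs or energy <= 0:
        raise ValueError("need at least one job and a positive energy budget")
    n = len(jobs)
    blocks = []   # tentative blocks, earliest first
    used = 0.0    # energy consumed by the blocks currently in `blocks`

    def block_energy(work, speed):
        return work * speed ** (alpha - 1)

    for k, (release, work) in enumerate(jobs):
        start, total = release, work   # the new job starts in its own block
        while True:
            if k + 1 < n:
                # Not the last job: the block must end when the next job arrives.
                speed = total / (jobs[k + 1][0] - start)
            else:
                # Last block: spend exactly the energy the other blocks leave over.
                remaining = energy - used
                speed = (remaining / total) ** (1 / (alpha - 1)) if remaining > 0 else 0.0
            if blocks and speed < blocks[-1][2]:
                # Slower than its predecessor: merge the two blocks and recompute.
                prev_start, prev_work, prev_speed = blocks.pop()
                used -= block_energy(prev_work, prev_speed)
                start, total = prev_start, total + prev_work
            else:
                break
        blocks.append((start, total, speed))
        used += block_energy(total, speed)

    last_start, last_work, last_speed = blocks[-1]
    return blocks, last_start + last_work / last_speed
```

For example, `inc_merge([(0, 5), (5, 2), (6, 1)], 17.0)` keeps all three jobs in separate blocks and returns makespan 6.5, while a budget of 6 merges everything into a single block. Each merge removes one block from the list, which is where the linear running time comes from.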

3.2 Finding all non-dominated schedules

A slight modification of IncMerge finds all non-dominated schedules. Intuitively, the modified 
algorithm enumerates all optimal configurations (i.e. ways to break the jobs into blocks) by starting 
with an "infinite" energy budget and gradually lowering it. To start this process, run IncMerge as 
above, but omit the merging step for the last job, essentially assuming the energy budget is large 
enough that the last job runs faster than its predecessor. To find each subsequent configuration 
change, calculate the energy budget at which the last two blocks merge. Until this value, only the 
last block changes speed. Thus, we can easily find the relationship between makespan and energy 
consumption for a single configuration and the curve of all non-dominated schedules is constructed 
by combining these. The curve for an instance with three jobs and power = speed$^3$ is plotted in
Figure 1. The configuration changes occur at energy 8 and 17, but they are not readily identifiable
from the figure because the makespan/energy curve is always continuous and has a continuous
first derivative for this power function. Higher derivatives are discontinuous at the configuration
changes. Figures 2 and 3 show the first and second derivatives.
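
The enumeration of configuration changes can be sketched in Python. The sketch assumes power = speed$^\alpha$ and strictly increasing release times; the function name `configuration_changes` is hypothetical. It first builds the blocks for all jobs except the last, then repeatedly computes the budget at which the last block slows to its predecessor's speed and merges with it.

```python
def configuration_changes(jobs, alpha=3.0):
    """Energy budgets at which the optimal block structure changes.

    jobs  -- list of (release_time, work), sorted by strictly increasing
             release time (an assumption of this sketch)
    alpha -- assumes power = speed**alpha with alpha > 1

    Returns the thresholds in decreasing order of energy.
    """
    n = len(jobs)
    blocks = []  # (start_time, work, speed) for every block before the last one
    for k in range(n - 1):
        start, work = jobs[k]
        speed = work / (jobs[k + 1][0] - start)
        # Merge while the newest block is slower than its predecessor.
        while blocks and speed < blocks[-1][2]:
            start, prev_work, _ = blocks.pop()
            work += prev_work
            speed = work / (jobs[k + 1][0] - start)
        blocks.append((start, work, speed))

    # With an "infinite" budget the last job forms its own block.
    last_work = jobs[-1][1]
    thresholds = []
    while blocks:
        earlier = sum(w * s ** (alpha - 1) for _, w, s in blocks)
        _, prev_work, prev_speed = blocks.pop()
        # The last block merges once it runs no faster than its predecessor.
        thresholds.append(earlier + last_work * prev_speed ** (alpha - 1))
        last_work += prev_work
    return thresholds
```

On the three-job instance used for the figures, `configuration_changes([(0, 5), (5, 2), (6, 1)])` yields 17 and then 8, matching the values quoted above. Recomputing `earlier` in every round keeps the sketch short but makes it quadratic in the worst case; a running total would restore linear time.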



4 Impossibility of exactly minimizing flow 

We have completely solved uniprocessor power-aware makespan by showing how to compute all 
non-dominated schedules, forming a curve such as Figure 1. The previous work on power-aware
scheduling for total flow includes a similar figure, but that figure omits parts of the curve where 
the optimal schedule finishes one job exactly as another is released. We now show that these gaps 
cannot be filled exactly. 

Figure 2: Relationship between energy and 1st derivative of makespan in non-dominated schedules
for instance with $r_1 = 0$, $w_1 = 5$, $r_2 = 5$, $w_2 = 2$, $r_3 = 6$, $w_3 = 1$, and power = speed$^3$.

Figure 3: Relationship between energy and 2nd derivative of makespan in non-dominated schedules
for instance with $r_1 = 0$, $w_1 = 5$, $r_2 = 5$, $w_2 = 2$, $r_3 = 6$, $w_3 = 1$, and power = speed$^3$.

Theorem 8 If power = speed$^3$, there is no exact algorithm to minimize total flow for a given
energy budget using operations $+$, $-$, $\times$, $/$, and the extraction of roots, even on a uniprocessor with
equal-work jobs.

Proof: We show that a particular instance cannot be solved exactly. Let jobs $J_1$ and $J_2$ arrive at
time 0 and job $J_3$ arrive at time 1, each requiring one unit of work. We seek the minimum-flow
schedule using 9 units of energy. Again we use $\sigma_i$ to denote the speed of job $J_i$. Thus,

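$$\sigma_1^2 + \sigma_2^2 + \sigma_3^2 = 9. \qquad (1)$$

For energy budgets between approximately 8.43 and approximately 11.54, the optimal solution
finishes job $J_2$ at time 1. Therefore,

$$\frac{1}{\sigma_1} + \frac{1}{\sigma_2} = 1 \qquad (2)$$

and Theorem 1 gives us that

$$\sigma_1^3 = \sigma_2^3 + \sigma_3^3. \qquad (3)$$

Substituting Equation (2) into Equations (1) and (3), followed by algebraic manipulation, gives

$$2\sigma_2^{12} - 12\sigma_2^{11} + 6\sigma_2^{10} + 108\sigma_2^{9} - 159\sigma_2^{8} - 738\sigma_2^{7} + 2415\sigma_2^{6}
- 1026\sigma_2^{5} - 5940\sigma_2^{4} + 12150\sigma_2^{3} - 10449\sigma_2^{2} + 4374\sigma_2 - 729 = 0.$$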



According to the GAP system, the Galois group of this polynomial is not solvable. This implies
the theorem by a standard result in Galois theory (cf. [10, pg. 542]). We owe the idea for this type
of argument to Bajaj. □
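
The algebraic manipulation in the proof can be sanity-checked mechanically. The sketch below assumes one particular elimination route, which the proof does not spell out: Equation (2) gives $\sigma_1 = \sigma_2/(\sigma_2 - 1)$, and Equations (1) and (3) give $(\sigma_1^3 - \sigma_2^3)^2 = (9 - \sigma_1^2 - \sigma_2^2)^3$ because both sides equal $\sigma_3^6$. Clearing the denominator $(\sigma_2 - 1)^6$ yields a polynomial in $x = \sigma_2$, which the code compares with the degree-12 polynomial from the proof using exact integer arithmetic. It says nothing about the Galois group.

```python
# Coefficients of the degree-12 polynomial in x = sigma_2, highest degree first.
COEFFS = [2, -12, 6, 108, -159, -738, 2415, -1026, -5940, 12150, -10449, 4374, -729]

def stated_polynomial(x):
    """Evaluate the polynomial from the proof with Horner's rule."""
    value = 0
    for c in COEFFS:
        value = value * x + c
    return value

def eliminated_form(x):
    """(sigma_1^3 - x^3)^2 - (9 - sigma_1^2 - x^2)^3, multiplied by (x - 1)^6,
    after substituting sigma_1 = x / (x - 1)."""
    lhs = (x ** 3 - x ** 3 * (x - 1) ** 3) ** 2
    rhs = (9 * (x - 1) ** 2 - x ** 2 - x ** 2 * (x - 1) ** 2) ** 3
    return lhs - rhs

# Two polynomials of degree at most 12 that agree at 13 or more points are identical.
points = range(-10, 11)
polynomials_match = all(stated_polynomial(x) == eliminated_form(x) for x in points)
```

Because the comparison uses Python integers at 21 points, agreement is exact rather than approximate.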

Since an arbitrarily-good approximation algorithm is known for total flow, one interpretation 
of Theorem 8 is that exact solutions do not have a nice representation. For most applications, the
approximation is sufficient since finite precision is the normal state of affairs in computer science.
Certainly, it could be used to draw an approximate curve for the gaps in the flow analog of Figure 1.
Only an exact algorithm such as IncMerge can give closed-form solutions suitable for symbolic 
computation, however. 



5 Multiprocessor scheduling 

Now we consider multiprocessor power-aware scheduling. In a non-dominated schedule, the 
processors are related by the following observations: 

1. For makespan, each processor must finish its last job at the same time or slowing the processors 
that finish early would save energy. 

2. For total flow, each processor's last job runs at the same speed or running them at the average 
speed would save energy. 

Using these observations, slight modifications of IncMerge and the total flow algorithm of Pruhs
et al. can solve multiprocessor problems once the assignment of jobs to processors is known.

We show how to assign equal-work jobs to processors for scheduling metrics with two properties. 
A metric is symmetric if it is not changed by permuting the job completion times. A metric is non- 
decreasing if it does not decrease when any job's completion time increases. Both makespan and 
total flow have these properties, but some metrics do not. One example is total weighted flow, 
which is not symmetric.

To prove our results, we need some notation. For schedule $A$ and job $J_i$, let $\mathrm{proc}^A(i)$ denote
the index of the processor running job $J_i$ and $\mathrm{succ}^A(i)$ denote the index of the job run after $J_i$
on processor $\mathrm{proc}^A(i)$. Also, let $\mathrm{after}^A(i)$ denote the portion of the schedule running on processor
$\mathrm{proc}^A(i)$ after the completion of job $J_i$, i.e. the jobs running after job $J_i$ together with their start
and completion times. We omit the superscript when the schedule is clear from context.

We begin by observing that job start times and completion times occur in the same order. 

Lemma 9 If OPT is an optimal schedule for equal-work jobs under a symmetric non-decreasing
metric, then $S_i^{\mathrm{OPT}} < S_j^{\mathrm{OPT}}$ implies $C_i^{\mathrm{OPT}} \le C_j^{\mathrm{OPT}}$.

Proof: Suppose to the contrary that $S_i^{\mathrm{OPT}} < S_j^{\mathrm{OPT}}$ and $C_i^{\mathrm{OPT}} > C_j^{\mathrm{OPT}}$. Clearly, jobs $J_i$ and $J_j$
must run on different machines. We create a new schedule OPT' from OPT. All jobs on machines
other than $\mathrm{proc}(i)$ and $\mathrm{proc}(j)$ are scheduled exactly the same, as are those that run before jobs
$J_i$ and $J_j$. We set the completion time of job $J_i$ in OPT' to $C_j^{\mathrm{OPT}}$ and the completion time of job
$J_j$ in OPT' to $C_i^{\mathrm{OPT}}$. We also switch the suffixes of jobs following these two, i.e. run $\mathrm{after}(i)$ on
processor $\mathrm{proc}(j)$ and run $\mathrm{after}(j)$ on processor $\mathrm{proc}(i)$. Job $J_i$ still has positive processing time
since

$$S_i^{\mathrm{OPT}'} = S_i^{\mathrm{OPT}} < S_j^{\mathrm{OPT}} < C_j^{\mathrm{OPT}} = C_i^{\mathrm{OPT}'}.$$

(The processing time of job $J_j$ increases so it is also positive.) Thus, OPT' is a valid schedule.
The metric values for OPT and OPT' are the same
since this change only swaps the completion times of jobs $J_i$ and $J_j$.

We complete the proof by showing that OPT' uses less energy than OPT. Since the power
function is strictly convex, it suffices to show that both jobs have longer processing time in OPT'
than job $J_j$ did in OPT. Job $J_j$ ends later so its processing time is clearly longer. Job $J_i$ also has
longer processing time since it runs throughout the time OPT runs job $J_j$, but starts earlier. □

Using Lemma 9, we prove that an optimal solution exists with the jobs distributed in cyclic
order, i.e. job $J_i$ runs on processor $(i \bmod m) + 1$.
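
A minimal sketch of the cyclic distribution, assuming jobs are indexed from 1 in order of release time and processors are numbered 1 through $m$; the function name `cyclic_assignment` is introduced here for illustration.

```python
def cyclic_assignment(n, m):
    """Map each job index i in 1..n to processor (i mod m) + 1."""
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 jobs and m >= 1 processors")
    return {i: (i % m) + 1 for i in range(1, n + 1)}
```

With 5 jobs and 2 processors, jobs 1, 3, and 5 go to processor 2 and jobs 2 and 4 go to processor 1, so consecutive jobs always land on different processors when $m > 1$.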

Theorem 10 There is an optimal solution for equal-work jobs under any symmetric non-decreasing
metric with the jobs distributed in cyclic order.

Proof: Suppose to the contrary that no optimal schedule distributes the jobs in cyclic order. Let
$i$ be the smallest value such that no optimal schedule distributes jobs $J_1, J_2, \ldots, J_i$ in cyclic order
and let OPT be an optimal schedule that distributes the first $i - 1$ jobs in cyclic order. To simplify
notation, we create dummy jobs $J_{-(m-1)}, J_{-(m-2)}, \ldots, J_0$, with job $J_{-(m-i)}$ assigned to processor $i$.
By assumption, $\mathrm{succ}(i - m) \neq i$. Let $J_l$ be the job such that $\mathrm{succ}(l) = i$, i.e. the job preceding job
$J_i$. Since the first $i - 1$ jobs are distributed in cyclic order, if we assume (without loss of generality)
that jobs starting at the same time finish in order of increasing index, then Lemma 9 implies that
$C_{i-m}^{\mathrm{OPT}} \le C_l^{\mathrm{OPT}}$. (Details omitted.)

To complete the proof, we consider 3 cases. In each, we use OPT to create an optimal schedule
assigning job $J_i$ to processor $(i \bmod m) + 1$, contradicting the definition of $i$.

Case 1: Suppose no job follows job $J_{i-m}$. We modify the schedule by moving $\mathrm{after}(l)$ to follow
$J_{i-m}$ on processor $(i \bmod m) + 1$. Since $C_{i-m}^{\mathrm{OPT}} \le C_l^{\mathrm{OPT}}$ and $\mathrm{after}(l)$ was able to follow job $J_l$,
it can also follow job $J_{i-m}$. The resulting schedule has the same metric value and uses the same
energy so it is also optimal.

Case 2: Suppose $J_{i-m}$ is not the last job assigned to processor $\mathrm{proc}(i - m)$ and $C_l^{\mathrm{OPT}} <
r_{\mathrm{succ}(i-m)}$. We extend the cyclic order by swapping $\mathrm{after}(l)$ and $\mathrm{after}(i - m)$. This does not change
the amount of energy used. To show that it gives a valid schedule, we need to show that jobs $J_l$ and
$J_{i-m}$ complete before $\mathrm{after}(i - m)$ and $\mathrm{after}(l)$ start, respectively. Job $J_l$ ends by time $S_{\mathrm{succ}(i-m)}^{\mathrm{OPT}}$ by the assumption
that $C_l^{\mathrm{OPT}} < r_{\mathrm{succ}(i-m)}$. Job $J_{i-m}$ ends by time $S_{\mathrm{succ}(l)}^{\mathrm{OPT}}$ since $C_{i-m}^{\mathrm{OPT}} \le C_l^{\mathrm{OPT}}$.

Case 3: Suppose $J_{i-m}$ is not the last job assigned to processor $\mathrm{proc}(i - m)$ and $C_l^{\mathrm{OPT}} \ge
r_{\mathrm{succ}(i-m)}$. In this case, we swap the jobs $J_{\mathrm{succ}(i-m)}$ and $J_{\mathrm{succ}(l)} = J_i$ but leave the schedules the
same. In other words, we run job $J_{\mathrm{succ}(i-m)}$ from time $S_{\mathrm{succ}(l)}^{\mathrm{OPT}}$ to time $C_{\mathrm{succ}(l)}^{\mathrm{OPT}}$ on processor $\mathrm{proc}(l)$
and we run job $J_{\mathrm{succ}(l)}$ from time $S_{\mathrm{succ}(i-m)}^{\mathrm{OPT}}$ to time $C_{\mathrm{succ}(i-m)}^{\mathrm{OPT}}$ on processor $\mathrm{proc}(i - m)$. The
schedules have the same metric value and each uses the same amount of energy. To show that we
have created a valid schedule, we need to show that jobs $J_{\mathrm{succ}(i-m)}$ and $J_{\mathrm{succ}(l)}$ are each released by
the start time of the other. Job $J_{\mathrm{succ}(i-m)}$ was released by time $S_{\mathrm{succ}(l)}^{\mathrm{OPT}}$ since $C_l^{\mathrm{OPT}} \ge r_{\mathrm{succ}(i-m)}$.
Since a job with index greater than $i$ follows job $J_{i-m}$, $r_i = r_{\mathrm{succ}(l)} \le r_{\mathrm{succ}(i-m)}$ and job $J_{\mathrm{succ}(l)}$ was
released by time $S_{\mathrm{succ}(i-m)}^{\mathrm{OPT}}$. □

A simpler proof suffices if we specify the makespan metric since then OPT has no idle time.
Thus, $r_{\mathrm{succ}(i-m)} \le C_{i-m}^{\mathrm{OPT}} \le C_l^{\mathrm{OPT}}$ and case 2 is eliminated.

Theorem 10 allows us to solve multiprocessor makespan for equal-work jobs. Unfortunately, the
general problem is NP-hard.

Theorem 11 Nonpreemptive power-aware multiprocessor makespan is NP-hard, even when all jobs 
arrive immediately. 

Proof: We give a reduction from Partition [12]:

Partition: Given a multiset $A = \{a_1, a_2, \ldots, a_n\}$, does there exist a partition of $A$
into $A_1$ and $A_2$ such that $\sum_{a_i \in A_1} a_i = \sum_{a_i \in A_2} a_i$?

Let $B = \sum_{i=1}^{n} a_i$. We assume $B$ is even since otherwise no partition exists. We create a
scheduling problem from an instance of Partition by creating a job $J_i$ for each $a_i$ with $r_i = 0$
and $w_i = a_i$. Then we ask whether a 2-processor schedule exists with makespan $B/2$ and a power
budget allowing work $B$ to run at speed 1.

From a partition, we can create a schedule where each processor runs the jobs corresponding to
one of the $A_i$ at speed 1. For the other direction, the convexity of the power function implies that
all jobs run at speed 1 so the work must be partitioned between the processors. □
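
The construction in the reduction is simple enough to write down directly. The sketch below builds the scheduling instance from a Partition instance; the function name, the dictionary layout, and the cubic default power function are assumptions made for illustration. It only constructs the instance and does not decide whether a schedule exists.

```python
def partition_to_scheduling(a, power=lambda speed: speed ** 3):
    """Build the scheduling instance used in the reduction from Partition.

    a     -- list of positive integers (the multiset A)
    power -- power as a function of speed; the cubic default is only an example

    Returns None when the sum B is odd (no partition can exist); otherwise a
    dict describing the 2-processor instance.
    """
    total = sum(a)
    if total % 2 != 0:
        return None
    return {
        "jobs": [(0, work) for work in a],   # (release time, work): r_i = 0, w_i = a_i
        "processors": 2,
        "makespan": total / 2,
        # Energy needed to run `total` units of work at speed 1:
        # power(1) sustained for total / 1 time units.
        "energy": power(1) * total,
    }
```

For the multiset [3, 1, 1, 2, 2, 1], the sum is 10, so the instance asks for makespan 5 with energy 10 on two processors.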

Pruhs et al. observed that the special case of all jobs arriving immediately has a PTAS
based on load balancing work by Alon et al. [2] on minimizing the $L_\alpha$ norm of loads.

6 Future work 

The study of power-aware scheduling algorithms is just beginning so there are many possible 
directions for future work. We consider the most important to be finding online algorithms with 
performance guarantees for makespan or total flow. No such algorithms are currently known, but 
many scheduling applications occur in the online setting. Our results on the structure of optimal 
solutions may help with this task, but the problem seems quite difficult. If the algorithm cannot 
know when the last job has arrived, it must balance the need to run quickly to minimize makespan 
if no other jobs arrive against the need to conserve energy in case more jobs do arrive.

We would also like to see theoretical research using models that more closely resemble real
systems. With this objective, we have been investigating actual implementations of dynamic voltage 
scaling. The most obvious feature of real systems differing from the standard model is that the 
speed has discrete settings rather than being a continuous variable. Imposing minimum and/or 
maximum speeds is one way to partially incorporate this aspect of real systems without going all 
the way to the discrete case. Another feature of real systems is that slowing down the processor has 
less effect on memory-bound sections of code since part of the running time is caused by memory 
latency. There is already some simulation-based work attempting to exploit this phenomenon [22].
Finally, real systems incur overhead to switch speeds because the processor must stop while the 
voltage is changing. This overhead is fairly small, but discourages algorithms requiring frequent 
speed changes. We have begun considering models incorporating some of these changes in the hope 
of finding one that more closely reflects real systems while remaining mathematically tractable.